Background: The iStopMM Prediction Model was developed to predict the probability of ≥10% bone marrow plasma cells (BMPC) in patients with monoclonal gammopathy of undetermined significance (MGUS), identifying those with smoldering multiple myeloma (SMM) or worse. While the model showed excellent performance in its derivation cohort, external validation is essential to confirm its clinical utility and generalizability in non-Icelandic populations.

Methods: We conducted an external validation of the iStopMM model in a retrospective cohort of patients from two US cancer centers who underwent their first bone marrow biopsy for the evaluation of MGUS. We excluded patients with anemia (Hb <10 g/dL), hypercalcemia (Ca >12 mg/dL), renal dysfunction (Cr >2 mg/dL or eGFR <40 mL/min/1.73 m²), known osteolytic lesions, a serum free light chain (FLC) ratio >100 or <0.01, IgM MGUS, or known non-Hodgkin lymphoma (NHL), multiple myeloma (MM), or AL amyloidosis. The predicted risk of ≥10% BMPC was calculated with the iStopMM online tool. We assessed the model's discrimination (area under the ROC curve, AUC), calibration (slope, intercept, Brier score, and Hosmer-Lemeshow [HL] goodness-of-fit test), and clinical utility by decision curve analysis (DCA), comparing it with the Mayo Clinic risk model. Subgroup performance was analyzed by race and ethnicity.
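The discrimination and calibration metrics listed above have standard definitions, and a small sketch can make them concrete. The Python below is a minimal illustration, not the authors' analysis code, assuming the conventional definitions: AUC as the concordance statistic, Brier score as the mean squared difference between predicted risk and outcome, calibration slope from a logistic regression of the outcome on the logit of predicted risk, and calibration intercept as calibration-in-the-large (slope fixed at 1). The abstract does not say which intercept definition was used, the HL test is omitted, and the function names and toy data are hypothetical.

```python
import math


def auc(y, p):
    """C statistic: share of (case, non-case) pairs in which the case has the
    higher predicted risk; tied pairs count as one half."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for a in cases:
        for b in controls:
            score += 1.0 if a > b else 0.5 if a == b else 0.0
    return score / (len(cases) * len(controls))


def brier(y, p):
    """Mean squared difference between predicted risk and observed outcome (0/1)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def calibration_slope(y, p, iters=100, tol=1e-10):
    """Fit logit(P(y=1)) = a + b * logit(p) by Newton-Raphson.
    Returns (a, b); b is the calibration slope (1 = ideal)."""
    x = [math.log(pi / (1 - pi)) for pi in p]
    a = b = 0.0
    for _ in range(iters):
        mu = [_sigmoid(a + b * xi) for xi in x]
        w = [m * (1 - m) for m in mu]
        # Score vector (g0, g1) and information matrix [[s0, s1], [s1, s2]].
        g0 = sum(yi - m for yi, m in zip(y, mu))
        g1 = sum((yi - m) * xi for yi, m, xi in zip(y, mu, x))
        s0 = sum(w)
        s1 = sum(wi * xi for wi, xi in zip(w, x))
        s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = s0 * s2 - s1 * s1
        da = (s2 * g0 - s1 * g1) / det
        db = (s0 * g1 - s1 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def calibration_in_the_large(y, p, iters=100, tol=1e-10):
    """Intercept a in logit(P(y=1)) = a + logit(p), with logit(p) as an offset.
    a > 0 means the model under-predicts risk on average (0 = ideal)."""
    x = [math.log(pi / (1 - pi)) for pi in p]
    a = 0.0
    for _ in range(iters):
        mu = [_sigmoid(a + xi) for xi in x]
        step = sum(yi - m for yi, m in zip(y, mu)) / sum(m * (1 - m) for m in mu)
        a += step
        if abs(step) < tol:
            break
    return a


if __name__ == "__main__":
    # Toy data (not study data): 10 patients at 20% predicted risk with 2 events,
    # 10 at 80% predicted risk with 8 events, so calibrated by construction.
    y = [1] * 2 + [0] * 8 + [1] * 8 + [0] * 2
    p = [0.2] * 10 + [0.8] * 10
    print(auc(y, p), brier(y, p))
    print(calibration_slope(y, p), calibration_in_the_large(y, p))
```

On these toy data the functions return an AUC of 0.8, a Brier score of about 0.16, a slope close to 1, and intercepts close to 0. A slope well below 1, such as the 0.32 reported in this cohort, indicates predicted risks that are too extreme for the validation population.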

Results: Our cohort included 230 patients from Weill Cornell Medicine or the University of Colorado Anschutz Medical Campus, of whom 154 (67%) were non-Hispanic White (NHW), 42 (18%) were non-Hispanic Black (NHB), and 16 (7%) were Hispanic (H). A total of 138 patients (60%) had ≥10% BMPC. The iStopMM model demonstrated good discrimination (AUC = 0.71; 95% CI: 0.64-0.78) but poor external calibration (slope = 0.32; intercept = 0.48; Brier score = 0.26; HL p < 2.2e-16). It also discriminated better than the Mayo Clinic model (AUC 0.71 vs 0.59, p = 0.0003). Decision curve analysis showed a slightly higher net benefit for the iStopMM model than for the Mayo Clinic model (0.51 vs 0.48). In subgroup analysis, the AUC was 0.74, 0.64, and 0.73 for H, NHB, and NHW patients, respectively, with poor calibration in all groups.
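The abstract reports one net-benefit value per model (0.51 vs 0.48) but not the threshold probability at which the decision curves were compared. As a hedged sketch, the standard net-benefit formula, NB = TP/n − FP/n × pt/(1 − pt), can be computed as below. Here a "positive" prediction is assumed to mean predicted risk ≥ pt (for example, proceeding to bone marrow biopsy), and the function names and toy data are hypothetical.

```python
def net_benefit(y, p, pt):
    """Net benefit of acting on every patient whose predicted risk is >= pt:
    NB = TP/n - (FP/n) * pt / (1 - pt)."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= pt and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= pt and yi == 0)
    return tp / n - (fp / n) * pt / (1 - pt)


def net_benefit_treat_all(y, pt):
    """Reference strategy: act on every patient regardless of predicted risk."""
    prevalence = sum(y) / len(y)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


def decision_curve(y, p, thresholds):
    """Rows of (pt, model NB, treat-all NB); treat-none has NB = 0 at every pt."""
    return [(pt, net_benefit(y, p, pt), net_benefit_treat_all(y, pt))
            for pt in thresholds]


if __name__ == "__main__":
    # Toy data (not study data).
    y = [1, 1, 0, 0]
    p = [0.9, 0.6, 0.7, 0.2]
    for row in decision_curve(y, p, [0.1, 0.25, 0.5]):
        print(row)
```

A model adds value at a given threshold when its net benefit exceeds both treat-all and treat-none. Because 60% of this cohort had ≥10% BMPC, the treat-all strategy itself approaches a net benefit of 0.60 at very low thresholds, which matters when reading absolute net-benefit values.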

Conclusions: The iStopMM model demonstrated reduced performance in this US cohort, with notably lower discrimination in NHB individuals. These findings highlight the need for larger multicenter validation studies and potential recalibration of the model to improve its accuracy and generalizability, particularly in minority populations. Limitations of this study include its retrospective design, data collection at only two centers, and the relatively small number of patients from underrepresented racial and ethnic groups.
